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Abstract 

The article derives a novel Gram-Charlier A (GCA) Series based Extended Rule-of-Thumb 
(ExROT) for bandwidth selection in Kernel Density Estimation (KDE). There are existing 
various bandwidth selection rules achieving minimization of the Asymptotic Mean Inte¬ 
grated Square Error (AMISE) between the estimated probability density function (PDF) 
and the actual PDF. The rules differ in a way to estimate the integration of the squared sec¬ 
ond order derivative of an unknown PDF (/(•)), identified as the roughness R(f"(-)). The 
simplest Rule-of-Thumb (ROT) estimates the R(f"(j) with an assumption that the density 
being estimated is Gaussian. Intuitively, better estimation of R(f"(j) and consequently 
better bandwidth selection rules can be derived, if the unknown PDF is approximated 
through an infinite series expansion based on a more generalized density assumption. As a 
demonstration and verification to this concept, the ExROT derived in the article uses an 
extended assumption that the density being estimated is near Gaussian. This helps use of 
the GCA expansion as an approximation to the unknown near Gaussian PDF. The ExROT 
for univariate KDE is extended to that for multivariate KDE. The required multivariate 
AMISE criteria is re-derived using elementary calculus of several variables, instead of Ten¬ 
sor calculus. The derivation uses the Kronecker product and the vector differential operator 
to achieve the AMISE expression in vector notations. There is also derived ExROT for 
kernel based density derivative estimator. 

Keywords: Kernel Density Estimation (KDE), Kernel bandwidth parameter selection, 
Kernel smoothing parameter selection, Gram-Charlier A Series, Multivariate Kernel Den¬ 
sity Estimation, Kernel Density Derivative Estimation 


1. Introduction 


Continuous probability density function (PDF) estimation using kernel methods is widely 


used in statistics, machine learning and signal processing (Silverman 1986). The optimal 


estimation depends upon the selected kernel function and its spread decided by the smooth¬ 
ing or bandwidth parameter. The selection of kernel has limited impact on optimal PDF 


estimation, although Epanechnikov kernel is the most optimal kernel (Epanechnikov, 1969). 


On the other hand, the optimal value of bandwidth parameter avoids either too rough or 
too smooth estimation of an unknown PDF. 

There exist variety of rules for bandwidth selection in KDE. The rules vary based 
on the criteria to measure accuracy of density estimation and to satisfy the used criteria. 
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The brief survey of data driven bandwidth selectors is provided by Wand and Jones (1994); 
Park and Marron (1990|); Jones and Sheather (1996); Park and Turlach (1992); Sheather 


(1986b). The Asymptotic Mean Integrated Square Error (AMISE) between the estimated 
PDF and the actual PDF is the most widely used performance criteria to derive the rules, 
though there are many others. The AMISE criteria requires estimating the roughness of 
the squared second order PDF (R(f "(•))) as a prior step to estimate the kernel bandwidth 
parameter, where the roughness of a function is defined as integration of the squared func¬ 
tion. The rules, based on the AMISE as a performance criteria, differ in a way to estimate 
the R(f"(-)). The simplest Rule-of-Thumb (ROT), satisfying the AMISE, by Silverman 


(1986) assumes Gaussian distribution for the unknown density. It is not the most optimal 
bandwidth selector but is used either as a very fast reasonably good estimator or as a first 
estimator in multistage bandwidth selectors. More precise solve-the-equation plug-in rules 
(Sheather, 1983, 1986a) use estimation of integrated squared density derivative functionals 
to estimate R(f"(-)). They demand high computations to solve a non-linear equation using 
iterative methods. They use ROT as a very first estimate. The fastest e-exact approxima¬ 


tion algorithm based solve-the-equation plug-in rule (Raykar and Duraiswami, 2005, 2006) 


requires 0(N + M ) order of computations, where N is number of samples and M is the 
selected number of evaluation points. So, deriving a data dependent bandwidth parameter 
selection rule for KDE that balances accuracy and computation is the goal of this article. 


The article achieves this goal by deriving an Extended Rule-of-Thumb (ExROT). The 
assumption about Gaussianity of an unknown PDF in ROT is too restrictive. Expressing 
an unknown PDF, in terms of an infinite series of higher order statistics, based on a more 
generalized assumption should result into a better bandwidth selection rule. As a verifica¬ 
tion and demonstration to this concept, the ExROT extends the Gaussian assumption in 
ROT to near Gaussian assumption. This facilitates use of cumulants based Gram-Charlier 
A (GCA) Series expansion as an approximation for the unknown PDF to satisfy the same 
AMISE criteria. The empirical simulations prove that the ExROT for bandwidth selec¬ 
tion is better than the ROT, with respect to an integrated mean square error (IMSE) or 
MISE performance criteria, for all types of nongaussian unimodal distributions including 
the skewed, the kurtotic and with outlier distributions. This is achieved with computation 
comparable to the ROT and too less compare to the e-exact solve-the-equation plug-in rule. 


The ExROT for bandwidth selection in univariate KDE is extended to the simi¬ 
lar for multivariate KDE and kernel based multivariate density derivative estimation. The 
ExROT for multivariate KDE requires multivariate expression for AMISE, multivariate 
Taylor Series expansion, multivariate Hermite polynomials and multivariate GCA Series 
expansion. The required multivariate AMISE is conventionally derived using gradient and 
Hessian of the PDF of a random vector (Wand and Jones 1994). Conventionally, the other 
required multivariate expressions are also derived using Tensor calculus, as higher order 
derivatives of a multivariate functions are involved. Often, the corresponding final expres¬ 
sions requires coordinatewise representations. But recently, the higher order cumulants 


(Terdik. 

2002] 

S. Rao Jammalamadaka and Terdik, 

May, 2006 

) and multivariate Hermite 

polynomials ([Terdik 

2002 Holmquist 

1996] are derived using only elementary calculus of 


several variables. This is achieved by replacing conventional multivariate differentiations by 
repeated applications of the Kronecker product to vector differential operator. The derived 
expressions are also more elementary as using vector notations and more comprehensive as 
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apparently more nearer to their counterparts in univariate. The same approach has been 
used here to derive multivariate AMISE criteria in a vector notations. Overall, the multi¬ 
variate ExROT is derived using the multivariate Taylor Series, the multivariate cumulants 


and the multivariate Hermite polynomials derived by Terdik (2002); Holmquist (1996), the 


multivariate GCA derived by Bhaveshkumar (2015a) and the multivariate AMISE obtained 
in Section [6] of this article. There is also derived bandwidth selection rule for kernel based 
density derivative estimation. 

The next Section [2] derives the univariate AMISE criteria and gives brief on the existing 
rules for data driven kernel bandwidth selectors. The Section [3] derives the ExROT. The 
performance analysis is done using two separate experiments in Section [4| The preliminary 
background on multivariate KDE, Kronecker product, multivariate Taylor Series and others 
is briefed in Section [5j A compact derivation for multivariate GGC Series and GCA Series 
is provided in Section |5.4| Then, the Section [6] derives multivariate AMISE in a vector 
form using Kronecker Product. The multivariate AMISE and multivariate GCA Series are 
used to derive ExROT for multivariate KDE in Section [7J Similarly, the Section [8] derives 
ExROT for density derivative estimation. Finally, the article is concluded in Section [9j 


2. Bandwidth Selection Criteria and Selectors 

Given N realizations of an unknown PDF f(x), the kernel density estimate f(x ) is given by 
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where, K{u) is the kernel function and h is the bandwidth parameter. Usually, K(u) is a 
symmetric, positive definite and bounded function; mostly a PDF; satisfying the following 
properties: 

/ oo roo roo 

K(u)du = 1, / uK(u)Du = 0, / u 2 K(u)du = ^(K) < oo 

-oo J —oo J —oo 

The accuracy of a PDF estimation can be quantified by the available distance measures 
between PDFs; like; Li norm based mean integrated absolute measure, L 2 norm based 
mean integrated square error (MISE), Kullback-Libeler divergence and others. The optimal 
smoothing parameter (the bandwidth) h is obtained by minimizing the selected distance 
measure. The most widely used criteria MISE or IMSE (Integrated Mean Square Error) 


based bandwidth selection rule, as in (Silverman, 1986 Wand and Jones, 1994), is briefed 
in the Appendix |A| It is given as under: 


MISE(/(x)) = E{ISE(f(x),f(x))} = E 


(/(*) - fi x )) 2 dx 
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where, ^{K) = f z 2 K 2 (z)dz, R(f") = / ( f"(x)) 2 dx and R{K) = f K 2 (z)dz. In general, 
R(g) = f g 2 {z)dz is identified as the roughness of function g(x). An asymptotic large sample 
approximation AMISE is obtained, assuming limjv->oo h = 0 and limv->oo Nh = oo i.e. h 
reduces to 0 at a rate slower than 1/N. 

AMISE (/(a;)) = ^{g 2 {K)) 2 R{f") + j^R(K) (3) 

The Equation ([3]) interprets that a small h increases estimation variance, whereas, a larger 
h increases estimation bias. An optimal h minimizing the total AMISE(/(x)) is given by, 

^hAMISE = ( MK )2%« )N y < 4 > 

Thus, the optimal bandwidth parameter depends upon some of the kernel parameters, 
number of samples and the second derivative of the actual PDF being estimated. 


2.1 Brief Survey on the bandwidth selectors 

As the bandwidth selection rules vary based on the choice of performance criteria for density 
estimation, they also vary based on the way the performance criteria is optimized. The 
various rules, satisfying AMISE, for bandwidth parameter selection differ in the way R(f") 
is estimated. The first group of rules named scale measures give rough estimate of the 
bandwidth parameter with less computation. It includes Silverman’s Rule-of- Thumb that 


estimates h assuming f{x) being Gaussian (Silverman 1986). For a Gaussian PDF R(f") = 

t -5 


and for a Gaussian kernel R(k) = Accordingly, 

h r „t = 1.0592 aN~ 1/5 


( 5 ) 


where, a is the standard deviation of f(x). There are many other rules based on the 
assumption of other parametric family. There are also rules based on oversmoothed h, 
difference based h and others briefed by (Janssen et al.[ 1995). 

An another group of rules is based on the more accurate at high computation plug-in 
rules. They plug-in the kernel based estimate of the R{f"). The direct plug-in rules estimate 
derivative of the density functionals instead of estimating actual derivatives. Every r th order 
derivative functional estimation requires (r+ 2) th order estimate and pilot bandwidth to start 
with. Assuming, some parametric density for the (r + 2) th order density the pilot bandwidth 
is selected and cumulatively the bandwidth parameter to estimate f(x) is obtained. The 
solve-the-equation plug-in rules use the same approach but, instead of assuming bandwidth 
parameter, they optimize it by directly putting it into the AMISE. This requires solving a 
non-linear equation iteratively. They have better performances at high computation. Other 
than the rules for bandwidth selection, there are also cross-validation methods selecting the 
best from a user-defined list of bandwidth parameter based on some performance criteria. 
But, there is always a compromise between a length of the list for possible bandwidth 
parameters and the amount of computation. 

Over all, a bandwidth selection rule that achieves precise bandwidth parameter at 
low computation is still open for research. 
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3. Extended Rule-of-Thumb (ExROT) Bandwidth Selector 


The Gaussianity assumption for an unknown PDF to estimate R(f") is too restrictive. 
Intuitively, better PDF estimations can be derived if R(f") is estimated by approximating 
PDFs through infinite series expansion. As a verification and demonstration to this concept, 
ExROT is derived in this section using cumulants based Gram-Charlier A Series expansion 
of f(x) based on near Gaussianity assumption for an unknown PDF f(x). The series is 
given as: 
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where, k r is the r th cumulant of f(x), is the r th cumulant of Gaussian PDF (G(x)) and 
D is the derivative operator with respect to x. With k\ = 71 , £'2 = 72 and taking derivative 
twice with respect to x yields: 
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Also, the nth derivative of Gaussian is given by 
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where, G(x) = exp (^) 2 


is the Gaussian PDF and H r is the nth order Hermite 


polynomial. Accordingly, approximating upto fourth order cumulant 
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The integration is obtained using the following rules: 
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for n odd, as the function becomes odd 
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where, (n — 1)!! = (n— l)(n — 3)(n — 5).... Accordingly, the quantities in Equation ([d]) are 
obtained as under: 


This gives: 
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Combining above Equation ([7]) with equation Q, the Gram-Charlier A Series based an 
Extended Rule-of-Thumb for bandwidth parameter hoc selection using near Gaussian PDF 
assumption and Gaussian kernel is given as under. As shown, the Silverman’s Rule-of- 
Thumb is one case of the extended rule. 


h cum = 1.0592u(C!V)-5 
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where, C = 


1 + 0.3764^ + 0.7292^ 


both k 3 = K 4 = 0 i.e. Gaussian PDF 
if k 3 = 0 i.e. symmetric PDF, 

1 + 1.09387’3 + 0.3764/c! + 0.7292 k 4 if cr = 1, 

1 + 1.09384 + 0.37644 + 0.7292^i otherwise 


h 2 

& 

.2 


^3 

2 


4. Performance Analysis of ExROT Bandwidth Selector 

There has been performed two separate experiments to test the performance of bandwidth 
selector. In both the experiments, the performance is tested on a set of 15 densities selected 


as a test-bed for density estimators by J. S. Marron (1992). The densities are shown in 


Figure [l] In both the experiments, the performance of ExROT is compared against Sil¬ 
verman’s Rule-of-Thumb and the e-exact approximation algorithm based solve-the-equation 


plug-in rule (Raykar and Duraiswami, 2005, 2006) 
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Figure 1: The Probability density functions (PDFs), generated through Normal mixtures, 
used to have performance comparison of various bandwidth selection rules for Ker¬ 
nel Density Estimation (KDE): (1) Gaussian (2) Skewed Unimodal (3) Strongly 
Skewed (4) Kurtotic Unimodal (5) Outlier (6) Bimodal (7) Separated Bimodal 
(8) Skewed Bimodal (9) Trimodal (10) Claw (11) Double Claw (12) Asymmetric 
Claw (13) asymmetric Double Claw (14) Smooth Comb (15) Discrete Comb 


4.1 Experiment 1 (Performance against varying PDFs) 

The first experiment is done to test the performance against varying PDFs. The experiment 
is done with 50000 samples and for 100 trials. The results are shown in Table [l] Each table 
entry is an average of the 100 trials. The selected three rules are compared for performances 
against three parameters - the value of bandwidth estimated, the corresponding IMSE 
between the estimated PDF and the theoretical PDF and the time taken in Seconds to 
estimate the bandwidth. The theoretical PDF, required to calculate IMSE, is obtained 
from the normal mixture equations. That is why, all the 15 selected densities are generated 
using normal mixture equations. 

The e-exact solve-the-equation plug-in rule is the best giving minimum IMSE error 
in all the cases accept the pure Gaussian density estimation. For Gaussian density, ROT 
is slightly better than remaining both the rules. The best IMSE performance of e-exact 
solve-the-equation plug-in rule is at the cost of very high computation time. The mean time 
to estimate the bandwidth parameter for ROT is less than one millisecond. For ExROT 
it is about 10 to 20 milliseconds and the same for e-exact solve-the-equation plug-in rule is 
about 30 to 60 seconds. That means, the ExROT has time complexity comparable to that 
of the ROT. So, the IMSE performance of these two needs a comparision. The boldface 


7 




















values for IMSE comparison in Table [lj show a better between these two. It shows that in 
all non-Gaussian unimodal density estimation cases - skewed or kurtotic or with outlier - 
ExROT has outperformed ROT. The worst performance of ExROT in multimodal density 
estimation is due to wrong estimation of the skewness and kurtosis. The ExROT has 
also outperformed ROT in some of the - claw and Asymmetric claw - multimodal density 
estimation cases. Thus, ExROT surely is a better option to ROT in unimodal density 
estimation. 


Table 1: Performance comparison of the bandwidth selection rules for Kernel Density Esti¬ 
mation (KDE) using 50000 samples. The results show the mean of the 100 trials. 
The boldface IMSE value show a better between the Rule-of-Thumb (IMSErot) 
and extended Rule-of-Thumb (IMSEexrot) rules. The time calculation is on a ma¬ 
chine with features: Intel(R) Core(TM)2 Duo CPU, 2.93 GHz, 4.00 GB Internal 
RAM, 32 bit Windows 7 Professional, MATLAB R2010a 


PDF 

Bandwidth (h) 

Integrated MSE (IMSERlO 13 

Estimation Time l 

in Sec) 

Type 

hrot 

hexrot 

heqfast 

rot 

exrot 

eqfast 

Trot 

Texrot 

Teqfast 

1 

0.1217 

0.1216 

0.1213 

0.1424 

0.1424 

0.1426 

0.0005 

0.0171 

30.9182 

2 

0.0993 

0.0860 

0.0818 

0.1770 

0.1702 

0.1702 

0.0005 

0.0157 

35.9430 

3 

0.1263 

0.0964 

0.0200 

3.7034 

2.8516 

0.4061 

0.0006 

0.0155 

52.1223 

4 

0.0996 

0.0693 

0.0204 

3.0476 

1.8294 

0.3834 

0.0008 

0.0154 

48.2712 

5 

0.0402 

0.0044 

0.0129 

1.9249 

0.7042 

0.4425 

0.0006 

0.0148 

41.9652 

6 

0.1464 

0.1632 

0.0967 

0.1908 

0.2220 

0.1504 

0.0006 

0.0149 

41.0530 

7 

0.1999 

0.2065 

0.0925 

0.3681 

0.3897 

0.1640 

0.0006 

0.0147 

40.6588 

8 

0.1333 

0.1596 

0.0736 

0.3122 

0.4117 

0.1920 

0.0006 

0.0153 

41.7342 

9 

0.1552 

0.1700 

0.0785 

0.3438 

0.3954 

0.1781 

0.0005 

0.0145 

41.1552 

10 

0.1058 

0.1042 

0.0242 

2.3709 

2.3269 

0.3392 

0.0005 

0.0149 

48.2730 

11 

0.1459 

0.1629 

0.0895 

0.7810 

0.7961 

0.7357 

0.0005 

0.0146 

41.7426 

12 

0.1356 

0.1350 

0.0323 

1.6634 

1.6586 

0.5287 

0.0005 

0.0156 

54.3546 

13 

0.1450 

0.1652 

0.0461 

1.1161 

1.1926 

0.5791 

0.0005 

0.0147 

47.9025 

14 

0.2001 

0.2058 

0.0280 

3.2280 

3.2763 

0.8810 

0.0005 

0.0140 

51.0985 

15 

0.2059 

0.2111 

0.0230 

3.2382 

3.2824 

0.4599 

0.0006 

0.0145 

50.8321 


4.2 Experiment 2 (Performance against varying number of samples) 

The second experiment is done to have the performance comparision of the same selected 
three bandwidth estimators against varying number of samples. The results of estimated 
bandwidth parameter, the IMSE and the estimation time (in Seconds) versus number of 
samples for varying PDFs are tabulated in Table [2] For better interpretations, the IMSE 
versus log of the number of samples are plotted; for all 15 PDFs and number of samples 
varying from 100 to the 50000; as shown in Figure [2] 

The IMSE performances against varying number of samples (Nsamples) for varying 
PDFs are similar to that discussed in Experiment 1. The ExROT performance is better 
for unimodal skewed, kurtotic or with outlier densities. Even it is better in some cases of 
multimodal skewed densities. Also, for small number of samples the ExROT performance 
is better compare to ROT. The convergence performance of ExROT is matching other two 
rules. The IMSE decreases almost inversely with increase in number of samples. 









Table 2: Performance comparison of the bandwidth selection rules for Kernel Density Es¬ 
timation (KDE) against varying number of samples. The results show the mean 
of the 50 trials. The time calculation is on a machine with features: Intel(R) 
Core(TM)2 Duo CPU, 2.93 GHz, 4.00 GB Internal RAM, 32 bit Windows 7 Pro¬ 
fessional, MATLAB R2010a 


PDF 

Nsamples 

Bandwidth Parameter ( h ) 

Integrated MSE ( IMSERlO 3 

Estimation Time ( 

in Sec ) 

Type 

* 10“3 

hrot 

hexrot 

heqfast 

rot 

exrot 

eqfast 

Trot 

Texrot 

Teqfast 


0.0100 

0.4189 

0.4379 

0.4059 

6.2430 

6.5102 

6.5579 

0.0011 

0.0004 

0.0968 


0.0200 

0.3613 

0.3797 

0.3573 

2.6582 

2.6653 

2.7344 

0.0001 

0.0002 

0.1210 


0.0500 

0.3038 

0.3099 

0.3008 

0.7831 

0.7855 

0.8064 

0.0001 

0.0004 

0.3287 


0.1000 

0.2659 

0.2681 

0.2602 

0.3148 

0.3176 

0.3239 

0.0001 

0.0006 

0.6734 

1 

0.2000 

0.2324 

0.2337 

0.2287 

0.1253 

0.1262 

0.1274 

0.0001 

0.0010 

1.3470 


0.5000 

0.1929 

0.1931 

0.1911 

0.0346 

0.0346 

0.0349 

0.0001 

0.0024 

3.2634 


1.0000 

0.1680 

0.1678 

0.1672 

0.0127 

0.0128 

0.0128 

0.0002 

0.0038 

6.3346 


2.0000 

0.1464 

0.1466 

0.1455 

0.0050 

0.0050 

0.0050 

0.0002 

0.0073 

12.8339 


0.0100 

0.3454 

0.3723 

0.2944 

8.9172 

10.3399 

9.2006 

0.0001 

0.0002 

0.0720 


0.0200 

0.2993 

0.3130 

0.2533 

3.4302 

3.8549 

3.4604 

0.0001 

0.0002 

0.1420 


0.0500 

0.2465 

0.2339 

0.2026 

0.9394 

0.9924 

0.9915 

0.0001 

0.0004 

0.3852 


0.1000 

0.2165 

0.1870 

0.1819 

0.3954 

0.3974 

0.3945 

0.0001 

0.0007 

0.7551 

2 

0.2000 

0.1893 

0.1663 

0.1565 

0.1399 

0.1374 

0.1381 

0.0001 

0.0010 

1.5336 


0.5000 

0.1571 

0.1367 

0.1296 

0.0425 

0.0422 

0.0422 

0.0001 

0.0023 

3.7511 


1.0000 

0.1370 

0.1198 

0.1134 

0.0168 

0.0162 

0.0161 

0.0002 

0.0036 

7.2034 


2.0000 

0.1194 

0.1038 

0.0986 

0.0066 

0.0064 

0.0064 

0.0003 

0.0065 

14.2614 


0.0100 

0.4429 

0.3498 

0.1650 

40.3373 

36.7614 

25.9410 

0.0001 

0.0002 

0.0864 


0.0200 

0.3846 

0.2996 

0.1188 

18.8112 

16.7646 

9.6509 

0.0001 

0.0002 

0.1738 


0.0500 

0.3183 

0.2459 

0.0844 

6.8916 

6.0170 

2.8246 

0.0001 

0.0004 

0.4436 


0.1000 

0.2775 

0.2125 

0.0653 

3.2038 

2.7464 

1.0352 

0.0001 

0.0006 

0.9069 

3 

0.2000 

0.2410 

0.1845 

0.0510 

1.4770 

1.2452 

0.3778 

0.0001 

0.0010 

1.9033 


0.5000 

0.2003 

0.1532 

0.0384 

0.5282 

0.4362 

0.1080 

0.0001 

0.0021 

4.9425 


1.0000 

0.1744 

0.1332 

0.0310 

0.2397 

0.1939 

0.0377 

0.0002 

0.0032 

10.3360 


2.0000 

0.1518 

0.1158 

0.0257 

0.1080 

0.0856 

0.0151 

0.0003 

0.0058 

21.1275 


0.0100 

0.3500 

0.2631 

0.1247 

41.5059 

36.6378 

23.6608 

0.0001 

0.0002 

0.1076 


0.0200 

0.2999 

0.2169 

0.0934 

19.3056 

16.1740 

8.6166 

0.0001 

0.0002 

0.2175 


0.0500 

0.2517 

0.1805 

0.0686 

7.0407 

5.6630 

2.3757 

0.0001 

0.0003 

0.5464 


0.1000 

0.2167 

0.1509 

0.0557 

3.2218 

2.4513 

0.9320 

0.0001 

0.0006 

1.0900 
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0.2000 

0.1899 

0.1333 

0.0449 

1.4626 

1.0710 

0.3232 

0.0001 

0.0010 

2.1309 


0.5000 

0.1582 

0.1102 

0.0350 

0.5031 

0.3439 

0.0875 

0.0001 

0.0022 

4.8694 


1.0000 

0.1375 

0.0957 

0.0296 

0.2205 

0.1448 

0.0344 

0.0002 

0.0037 

9.6654 


2.0000 

0.1197 

0.0833 

0.0252 

0.0951 

0.0600 

0.0136 

0.0002 

0.0058 

19.3117 


0.0100 

0.1329 

0.0158 

0.0511 

56.5031 

41.3602 

21.4029 

0.0003 

0.0003 

0.1138 


0.0200 

0.1215 

0.0145 

0.0429 

25.6093 

14.6056 

8.5744 

0.0001 

0.0002 

0.2072 


0.0500 

0.0992 

0.0110 

0.0352 

8.1567 

4.2720 

2.5379 

0.0001 

0.0003 

0.4943 


0.1000 

0.0881 

0.0100 

0.0297 

3.4558 

1.6044 

0.9628 

0.0001 

0.0006 

0.9657 
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0.0257 
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0.0001 

0.0010 

1.7802 
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0.0072 

0.0209 
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0.1696 
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4.2263 


1.0000 

0.0556 

0.0062 

0.0181 

0.1705 

0.0652 

0.0397 

0.0002 

0.0028 

8.4271 


2.0000 

0.0483 

0.0053 

0.0156 

0.0675 

0.0244 

0.0156 

0.0002 

0.0056 

16.8823 
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Table 3: Performance comparison of the smoothing (bandwidth) parameter selection rules 
for Kernel Density Estimation (KDE) against varying number of samples. The 
results show the mean of the 50 trials. The time calculation is on a machine with 
features: Intel(R) Core(TM)2 Duo CPU, 2.93 GHz, 4.00 GB Internal RAM, 32 
bit Windows 7 Professional, MATLAB R2010a 


PDF 

Nsamples 

j Bandwidth Parameter ( h ) 

Integrated MSE ( IMSE )* 10 d | 

| h Calculation Time ( T ) j 

Type 

CO 

1 

o 

* 

hrot 

hexrot 

heqfast 

rot 

exrot 

eqfast 

Trot 

Texrot 

Teqfast 


0.0100 

0.5128 
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0.0002 

0.1431 
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0.1614 
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0.0010 
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0.5000 
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0.1225 

0.0674 

0.0856 
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0.0001 
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4.1356 
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0.1841 
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0.1045 

0.0268 

0.0345 

0.0173 

0.0002 

0.0030 

8.2747 
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0.1602 

0.1917 

0.0896 

0.0106 

0.0139 

0.0068 

0.0002 

0.0054 

16.6340 
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0.5406 

0.5921 

0.3817 

9.4215 

10.1084 

7.9126 

0.0003 

0.0002 

0.0782 


0.0200 

0.4662 

0.5133 

0.3161 

4.0153 

4.3435 

3.4017 

0.0001 

0.0002 

0.1410 


0.0500 

0.3881 
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0.2522 

1.3227 

1.4426 

1.0262 
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0.0003 

0.3687 
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0.3397 

0.3718 
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4.1529 
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0.2143 
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0.0293 
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0.0002 
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8.1136 


2.0000 

0.1865 

0.2042 

0.0979 

0.0119 

0.0135 

0.0066 

0.0002 

0.0055 

16.2836 


0.0100 

0.3648 

0.3635 

0.3384 
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0.0631 
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10.9313 
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0.1727 
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0.9879 

0.3128 
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0.3624 
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0.1454 

0.0359 
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0.0002 

0.0028 

10.9701 


2.0000 

0.1271 

0.1254 

0.0303 

0.0728 

0.0719 

0.0124 

0.0002 

0.0055 

21.3417 
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Table 4: Performance comparison of the smoothing (bandwidth) parameter selection rules 
for Kernel Density Estimation (KDE) against varying number of samples. The 
results show the mean of the 50 trials. The time calculation is on a machine with 
features: Intel(R) Core(TM)2 Duo CPU, 2.93 GHz, 4.00 GB Internal RAM, 32 
bit Windows 7 Professional, MATLAB R2010a 
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0.5102 
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0.0010 

1.5428 


0.5000 

0.2311 

0.2585 

0.1560 
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0.0002 
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0.0193 

0.0003 

0.0055 

16.4165 
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0.4851 
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Figure 2: The Integrated Mean Square Error (IMSE) comparision of the bandwidth selec¬ 
tion rules for Kernel Density Estimation (KDE) of various PDFs against varying 
number of samples. The solid lines (-) indicate Rule-of-Thumb] dashed lines (- 
-) indicate extended Rule-of-Thumb] the dash-dot lines (-.) indicate the e-exact 
solve-the-equation plug-in rule 


5. Prerequisites for Multivariate ExROT 

The derivation for multivariate ExROT requires expressions for multivariate KDE, mul¬ 
tivariate Taylor Series, multivariate cumulants, multivariate Hermite polynomials, multi¬ 
variate GCA Series and multivariate AMISE for bandwidth selection in KDE. Due to the 
reasons discussed in the Section [lj the article uses the expressions in vector notations. The 
required pre-requisites are reported in this section and the multivariate AMISE is derived 
in the next section. 

5.1 KDE for Multivariate PDF 

The nonparametric estimation of a multivariate PDF requires a multivariate kernel. A 
multivariate kernel can be derived in two ways. The most popular technique is to use a 
product kernel i.e. a d-dimensional kernel is achieved through the product of 1-dimensional 
kernels in each direction. The product kernel may have unequal spread or bandwidth in 
each direction. The other way is to select a spherically symmetric multidimensional PDF 
as the kernel, called spherical kernel. 

Let x = (xi,X2,...,Xd) be a d-dimensional random vector. Let /C(u) be the d- 
dimensional kernel. Then, the multivariate PDF of x with available N realizations can be 
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given as under: 


N 


N 


= N^ ( H_1(x - Xj) ) = N E ^h( x - *0 


^ det(H) 
2=1 v 7 


N 


2=1 


(9) 


where, H is the bandwidth matrix, det(-) implies the determinant of a matrix and the 
symbol /Ch(-) = dct( H ) ^ (H _1 (-)). I n a correlated data, H is made proportional to the 
covariance matrix C x to compensate for the mutual correlations. Usually, the kernel /C( u) is 
a multiplicative kernel given by /C(u) = flf=i £(«*)■ Also, it is symmetric, positive definite, 
zero mean and bounded function; usually a PDF. 

Let T be the class of all symmetric, positive definite d x d matrices. Then, H £ M. 
A simplification can be achieved, if H £ P, where V C Ad is a set of d x d diagonal 
matrices. More precisely, H = diag(/ri, h 2 , ■. ., h d ). The multivariate PDF x with available 
N realizations is now given as under: 


1 N 1 


N hih 2 ...h d 
2=1 

i 

*=i l j=i 


/c 


*^1 *^2l %id 

) * * 1 5 


/ii 


h d 


( 10 ) 

( 11 ) 


Further simplification is achieved if H £ 5, where 5 C V is a set set of matrices with 
constant diagonal. More precisely, H = hid i.e. equal bandwidth in all directions. 


5.2 Taylor Series of a multivariate function 

Let x = (Xi, X 2 ,..., Xd)' be a d-dimensional column vector and /(x) be the function of 
several variables differentiable in each variable. The definition and application of Kronecker 

product on vector derivative operator D x = ^g|^, ■ ■ ■ , g|^J are briefed in Appendix 

[B| Using them, the Taylor Series in vector notations for /(x) by expanding it at origin, is 
given as under: 

m=o o 1 

/(x)= £ -jk(rn,d) , x® m (12) 

Z — J ml 

m =0 

where, k(m, d) is the vector of dimension d m x 1 and given in terms of the derivative vector 
D x as 

k (m,d) = (D| m /(x))| x=0 

5.3 Moments, Cumulants and Hermite Polynomials 

Let x = (X 1 ,X 2 , ...,Xd)' be a d-dimensional random vector, /(x) be its joint PDF differ¬ 
entiable in each variable. Also, let for A = (Ai, A 2 ,..., X d )', A £ and /(x) - J~( A) be the 
Characteristic function, M(A) be the Moment Generating Function (MGF), C(A) be the 
other form of Characteristic function through natural log of -F(A), C(A) be the Cumulant 
Generating Function (CGF). Then, assuming M(A) and J-{ A) have been expanded using 
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Taylor Series, 


/ OO 

x® fc /(x)dx = Df M(A) a=0 = (-i)*D®* J- X (A) 

-oo 


A=0 


(13) 


Assuming C(A) and C(A) have been expanded using Taylor Series, 

c (fc, d) = Df C(A)| a=0 = (-i) fc Df C x (A)| a=0 (14) 

where, C(A) = lnM(A) and C(A) = InJ 7 ^ A). The details on the moments, cumulants and 


their relationships can be found in (Bhaveshkumar 2015a). 


The multivariate Hermite polynomials are defined as under in Equation (15). 


H fc (x; 0, C x ) = [G(x; 0, C*)]-^-!)* (C X D x f k G(x; 0, C x ) (15) 

where, G(x;0,C x ) = jCxp 1 / 2 (27r) _d / 2 exp (x / C x 1 x) = (Cxp 1 ^ 2 (27r)~ d / 2 exp ^(vecCf) 7 x 1 ^ 

(16) 


The d-variate Hermite polynomials, derived by Terdik (2002, Proposition 6), are given by 
the following recursion rule: 


H 0 (x) = l;Hi(x) = x 
For n > 1 

H n (x(i :n) ) = H n _i(x(l : n - 1)) <g) x 


n— 1 


1,2) (^( 1:n )) X C ™j(2,d) 

3 =1 


(17) 


H n _ 2 (x ( i y -_i ii+ i :n _i ) ) (18) 


where, the permutation V{n,j ) —y (1,2) is putting the n th element to the first and j th to 
the second place, keeping the other elements unchanged. More precisely, 


-l 


1 ,2) (^(l :n )) — (K-Vj+l—>2{d n , d(l :n _l))K-p n _^i {d n , d(l :7l _l))) 

= {dn > ^(l:n—1))^T 7 j-|-1—>-2 {dn > ^(l:n—1)) 

5.4 A Compact Derivation for Multivariate GGC Series and GCA Series 


A full derivation for multivariate GGC and corresponding GCA can be found in Bhaveshku¬ 


mar (2015a). This is a compact derivation for Generalized Gram-Charlier Series. The de¬ 


tailed proof can be found in the article. Let J- x be the characteristic function of an unknown 
multivariate PDF /(x). Let ’L(x) be the characteristic function of a reference multivariate 
PDF V’(x). Also, let c(k,d) be the k th cumulant of d-dimensional /(x) and c r {k,d) be 
the k th cumulant of d-dimensional ^(x). The difference of the cumulant vector is given by 
S{k,d ) = c{k,d) — c r (k,d). Instead of the detailed proof, a compact derivation of GGC 
follows here. 
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Let A = (Ai, A' 2 ;,’..., A d) T , A £ M d ; 0 be the Kronecker product operator and F i 
the Fourier transform operator. Then, 


^x(A) = exp 


= exp 


V' n, ^( iA ) 

2^c (fc,d) 

,k= 1 
oo 

E^M) *, 

,fc=l 

2^a(k,d) 


k\ 

(i\)® k 


( V definition 


exp 


^2c r (k,d) 


(iA) 




, fc=i 


lb! 


Lfc=o 


k\ 


F(^(x)) 


Taking inverse Fourier transform of the above equation brings 

00 /_ 1 \k 

/(x) = d) (x) * V’C 


k=0 

00 


fc! 

<-i)‘ 


/(x) = E“( 


k =0 


where, * indicates convolution. Thus, the GGC is obtained in a compact way. 
The a(k,d) can be derived using its equality in terms of S(k,d) as: 


^ 2 <x(k,d) 


(<A) 




k =0 


L! 


= exp 


OO 


^2d(k,d) 

L/c=i 


(iA) { 


k\ 


This results into the following relations: 


(19) 

( 20 ) 
( 21 ) 

( 22 ) 

(23) 


a( 0 , d) =1 

a(l, d) =<5(1, d) 

a(2, d) =<5(2, d) +<5(1, d)® 2 

a(3, d) =<5(3, d) + 3<5(2, d) 0 <5(1, d) + <5(1, d)® 3 

a(4, d) =<5(4, d) + 4<5(3, d) 0 <5(1, d) + 35(2, d ) 02 + 65(2, d) 0 5(1, d ) 02 + 5(1, d ) 04 
a(5, d) =5(5, d) + 55(4, d) 0 5(1, d) + 105(3, d) 0 5(2, d) + 105(3, d) 0 5(1, d ) 02 
+ 155(2, d )® 2 0 5(1, d) + 105(2, d) 0 5(1, d ) 03 + 5(1, d )® 5 
a( 6 , d) =5(6, d) + 65(5, d) 0 5(1, d) + 155(4, d) 0 5(2, d) + 155(4, d) 0 5(1, d ) 02 + 105(3, d ) 02 
+ 605(3, d) 0 5(2, d) 0 5(1, d) + 205(3, d) 0 5(1, d )® 2 + 155(2, d )® 3 
+ 455(2, d )® 2 0 5(1, d )® 2 + 155(2, d) 0 5(1, d ) 04 + 5(1, d )® 6 


The multivariate GGC, for specific Gaussian PDF reference and matching first and second 
order moments, can be obtained by taking 5(1, d) = 0, 5(2, d) = 0 and c r (k,d) = 0. The 
GGC with respect to a Gaussian PDF is identified as GCA Series, given as under: 


/(x) = G(x) - C( ! ! d) V)(x) + ^G< 4 >(: 
+ C ( 6 ,dy + ioc(3,d) 02/ cf6)(::) + _ 


x - 


c(5, d)' 
5! 


G (5 )( 


x 


(24) 
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6. AMISE in vector notations for bandwidth selection multivariate KDE 


The derivation for bandwidth parameter selection, satisfying AMISE between the estimated 
multivariate PDF and the actual multivariate PDF, can be found in (Wand and Jones, 1994). 
For the reasons described before, it is re-derived here in terms of vector notations using 

Kronecker product and differential operator as a column vector D x = ^g|^-, gfg, ■ ■ ■ , g|^ . 


MISE (/(x),/(x)) 


Bias 2 (/(x))dx + 



Var (/(x))dx 


(25) 


The bias and variance estimations, 
derived as under. 


using the Taylor Series expansion in Equation (12), are 


N 


£{/( x )} = E {£h( x - X*)} 

2=1 

/ OO 

/C H (u-x)/(u)du 

-OO 

/ OO 

/C(z)/(x + Hz)dz ( substituting z = u — x ) 

-OO 

= J°° /C(z) (/(x) + k(l, d)'(Hz) + ^k(2, d)'( Hz )® 2 + O(tr(H 02 ))^ dz 


where, k(i, d) = Df l /( x ) | x=x 

Assuming the kernel with symmetric and bounded PDF the following properties are satis¬ 
fied: 


/ OO 

u/C(u)du = 0 

-OO 

/ OO 

u (8) u/C(u)du = m/c(2, d) = /i 2 (/C)vec(I( dxd )) 

-OO 

= M 2 (/C) 5 2 where, S 2 = vec(I (dxd )) 


(26) 

(27) 

(28) 


Here, m^(2, d) is the second order moment vector of the d-variate kernel with d? components 
and S 2 is a vector of size (d 2 x 1 ) with only selected d values being 1 . The second order 
moment of each component is constant with value ji 2 (/C) and all cross-moments are zero. 
Using these properties, the bias can be written as: 


=>■ Bias(/(x)) = E{f(x)} - /(x) « ^k(2, d)' (H 02 m,c(2, d)) (29) 
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Similarly, the variance of the estimation can be approximated as under: 


/ 1 N 

Var(/(x)) = Var I — ^/C H ( 


X - X,; 


1=1 


1 


r oo -I 

^2/i 1 z?2 


JVdet(H) _/„« 

i r°° 

TVdet(H) J 


/C 2 (*)/(x + Hz)dz--^ 


{/(X)} 


K?(z) (/(x) + k(l, <£)'Hz + k(2, d)'(Hz) 02 + O(tr(H 02 ))) dz 


Var(/(x)) 


1 


JVdet(H) 


/(x) / /C 2 (z)dz (•_■ assuming large IV, small h ) 


(30) 


Combining equations (25), (29) and (30); we get: 

MISE(/(x)) = \J^ (k(2, d)'H® 2 m^(2, d)f dz + + O (tr (H 04 )) + O ( 


f det(H)\ 
) 


where, R{K ,) = j ( ^ 1 oo K?{z)dz. An asymptotic large sample approximation AMISE is ob¬ 
tained, assuming liin/v-^oo det(H) = 0 and lirri/v->oc IVdet(H) = oo i.e. det(H) reduces to 0 
at a rate slower than 1/N. 

i r 00 o i 

AMISE (/(x)) = -j (k(2, d)' H 02 m^(2, d)f dz + ^ fl(/C) 

1 /*oo 1 

= 4 ^(/C) J ^ (k( 2 , d)' (H 02 <5 2 )) 2 dz + (31) 

Now, (k(2, d)' (H 02 5 2 )) 2 = ((k(2,d) o S 2 )' (H 02 <5 2 )) 2 

= ^(H i 2 5 2 ) 02 ^ (k(2,d) o 6 2 ) 02 ('.• x'Ax = (vecA)'x 02 ) 

=* AMISE(/(x)) = ^/i 2 (/C) (H 04 5 02 )'R(/"(x)) + Nd ^ {u) m) (32) 

/ OO 

(k( 2 , d) o S 2) 02 dz 

-OO 

The Equation ( |32| ) interprets that a small det(H) increases estimation variance, whereas, 
a larger det(H) increases estimation bias. Also, R(/ , 7 (x)) is the roughness vector of / / 7 (x) 
with dimension d 4 X 1 and the corresponding total roughness is given by, 


d 4 


d 4 r 


Wx)) = E[R(/ ,, (x)]i = E 


i= 1 


(k(2, d) o <5 2 ) 02 dz 


E 

\i =1 


g 2 /(x) 

0 X 2 


2=1 L 
2 


(33) 


(•.■ 82 has only d number of ones.) 


r°° r 

/ (V 2 /(x))' 

J— 00 L 


dx 
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where, the indicator function <52 assures that only few partial derivatives are non-zero. 

To simplify the bandwidth estimation, let us assume H € T> and T> is the set of all 
diagonal matrices. That is, H = diag(/ii, fo,..., hd) = diag(h). Corresponding to that, 

H 02 <52 = h 02 o ^2 = 2 (symbolically) 


where, o denotes hadamard product. Based on this simplification, the AMISE is given by, 


AMISE(/(x)) = (h 04 o 5 02 )'R(/"(x)) + ] R(K) 


NUU ^ 


(34) 


Taking partial derivative of AMISE(/(x)), in Equation (34), with respect hi and comparing 
it to zero gives the optimal bandwidth parameter hi. The AMISE required is given as under: 

d 


^(/W) - 3 [R(r(x)] , + 2fe ( E h 2 o ( [R(/"(x) )]ri + [H(/"(x) )]r2 ) 

R(K) = 0; 




NhilYlUhj 


(hAMISE)i = 


R{K) 




( R(JC) \ d+i TT d 

= ( „| ()c)[R(/ „ (x)liA , J (••• Approximating det(H) = [[ % = h t ) 


R(JC) 


(35) 


To simplify further the bandwidth estimation, let us assume H?5, where S CP 
with constant diagonal. Accordingly, let H = hid he. same bandwidth in all directions. 
With this condition the AMISE can be given as: 

AMISE(/(x)) = jl4(K) (I® 4 df )'R(/"(x)) + J^m) 

= ^(K)B(r(x)) + -E B( k:) (36) 


Taking derivative of AMISE (/(x)) in Equation (36) with respect to h and comparing it to 
zero gives the optimal bandwidth parameter minimizing the AMISE. It is: 


^miSE(Hx)) = h^aofRU") - 


R{K) = 0 


hAMISE 


_ f R(k) 


W 2 {K)R(f")N 


d +4 


(37) 


Thus, the optimal bandwidth parameter depends upon some of the kernel parameters, 
number of samples and the second order derivative of the actual PDF being estimated. 
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6.1 Rule-of-Thumb for multivariate KDE 

Let us apply the Rule-of-Thumb for bandwidth estimation i.e. estimate the bandwidth 
assuming the unknown PDF as a multivariate Gaussian. For a multiplicative Gaussian 
kernel, = 1, R{IC) = 2 ~ d -K~ d / 2 and 


*(/"(*)) = 


„ , , d +4 

(d + 2) | C x | 2 


2 (d+2) n d/2 


So, the bandwidth parameter is obtained as 


hROT = 


i 

4+d 


a 


(2 + d)N / 

where, a is the standard deviation, assumed same in all directions. 


(38) 


7. Extended Rule-of-Thumb for multivariate PDF 

The ExROT for bandwidth selection in multivariate KDE requires multivariate PDF ap¬ 
proximation using infinite series. To estimate R(k(2,d)) let us first obtain k(2,d). To 
obtain the ExROT expressions in vector notation, let us use vector derivative operator with 


Kronecker product applied twice to the Equation (23) for GGC Series. 


k(2, d) = Df/(x) = 


k =o 


kl 


(39) 


Taking Gaussian PDF as a reference PDF; i.e. t/;(x) = G*(x); i n Equation (39) or directly 


taking twice differentiation of Equation (24) for GCA Series and Equation (18) for kth order 
vector Hermite polynomial H/IM R(/"(x)) in Equation (33) is derived as under: 


k(2, d) = D® 2 /(x) = 


k =o 


k\ 


i \fc ( C (^’> d) <g> id 2 )' u 

/ y ( 1) j, j ('- > x ) Dfc_)_2(x, Gj 


k =0 


(40) 

(41) 


(27r) |C X | 1,/2 exp ^ (vecCx 1 ) 7 x <g) x 


(i® w 


(C- 1 )® 2 H 2 (x) 


(c(3,<i) ® w (c-1)® 5 H 5 (x) + (CP) 06 H 8 (X) 


3! 


d 4 r 


«(/"(*))« £ [|“ o 2 (x) | (c; 1 )® 2 h 2 (x) - 

, ( c (4, d) ® I d 2) / 


(c(3, d) (g> i(py 


3! 


4! 


(C x ! ) H 6 (x) o * 


;c- 1 )® 5 h 5 (x ) 

(42) 


1. The symbol H*, for kth order Hermite polynomial need not be confused with the notation H for band ¬ 

width matrix . 
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The following simplification using vector algebra helps solving above Equation (42). 


ab 0 cd = (a <g> c) (b <S> d), Also a' (g) b' = (a (g) b) 7 
((c(M) ® W ( C - 1 )® fc+2 H fc+2 (x)) 02 = ((c(M) ® I^)® 2 )' ((C^ 1 ) 
((1 ® I d2 y (c x f 2 H 2 (x)) ® ((c(4, d) ® Irf.)' (C^ 1 )® 6 H 6 (x)) 

= ((1 ® IeP) ® (c(4, d) <g> I*))' (c x ! )® 8 (H 2 (x) ® H 6 (x)) 


-1\ ®fc+2 


Hfc +2 (x 


i®i,p)'(c;‘) 02 h 2 (x)) of 


= [H,(x)of 1 ]®' = [(C;') , 2 H i (x)]> 


The integration in equation (42) is obtained using the following rules: 


/ OO POO / ^ 

exp (-x'Ax) dx = / exp (—vec (A) 7 x <8> x) dx = —— 

-oo J— OO V l-^-l 


d/2 


D 59 T = _ 

vec(A) vec(A) l ui 


7T 


d/2 


/•oo r/ / 7T \ 

J x i; 2 exp (—vec (A) 7 x <g> x) dx = - J (vecA) 


(- 1 ) 


a ian t — r) Q9n _ 

Also, u veC ( A )J — 1J vec(A) l 1^1 


7T 


d/2 


f 


X 


eX p(-v e c(A)' X ® X )<i X = ' )' ,/ 2 (veC A)®(-”) 


In general, 



vec (C^. 1 ) 7 x ® xj 


dx 


| (d+ w 2))!! (^h) J/2 ( c ( 2 ^)) 0n/2 


for n even 


for n odd 
(43) 


where, (d + n)!! = (d + n)(d + n — 2)(d + n — 4)... d( or d — 1). The integration rule in 
Equation 43 depend upon the second order vector cumulant and the length of the random 
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vector d. The necessary integrals are obtained as under. 


tj = f exp (vec(C x 1 ) / x <8) x) [H2(x)] 02 dx ^ 
J — OO 

/ oo 

exp (vec(C~ 1 ) / x <8 x) [H 5 (: 

-OO 


7r 


|C 


-ll 

X 


d/2 


c(2,d) 5 


(x)r dx 


(d + 8)!! , 1on (d + 4)H , ooc d nn (d + 6)!! onn (d + 2)!! 

2 5 + 130 2 3 + 225 2 ~ 20 2^- 300 2 2 


7r 


t3 = 


exp (vec(C x 1 ) , x <8 x) [Hs(x)] (8)2 dx = 


(d + 10)!! 
26 


+ 225 


+ 2025+±^+225-30+±^- 1380+±i+-135ol 

/ OO 

exp (vec(C x 1 ) / x <8) x) [Hi(x) <8 Hs(x)] dx = 0 

-OO 

/ OO 

exp (vec(C x 1 )'x <8 x) [H 5 (x) <8 H 6 (x)] dx = 0 

-OO 

/ OO 

exp (vec(C x 1 )'x <8 x) [H+(x) (8 Hg(x)] dx 

-OO 


Cx 1 ) 

(d + 6)!! 
2 4 

7T 


d/2 


c(2,d)« 


i-ii 


d/2 


c(2,d) s 


(d + 6)H _(d + 4)H , (d + 2)!! „ n d , 1C 

24- 16 2 3 + 60 2^-60-+ 15 


7r 


i—l i 


d/2 


c(2,d) 5 


Using above integrands, the Equation (42) for can be rewritten as under: 

s-d 


R(/"(x)) « (27r)~ a IC.I- 1 | (1 8) I d2 f 2 ' (C- 1 ) 04 !, + ( C(3 ’^,® ^ “ (c; 1 ) 010 


t2 


+ fM++M) K2 ' (c;1) ® 12 t 3 

+ (1 ® Id 2 ) (8 ( c (4, d) <8 Id' 2 )' ( C -I)® 8 t 6 \ 0 af>2 


d +2 


R(/ ,/ (x)) W 2~ d TT~ d ^ 2 IC' 1 ! 2 


r d 2 - 2d + 4 


(vecCx 1 ) 0 


+ + 130+++ + 225,7 - 20+tZ—!— - 300 (d + 2>!! 


c(3, d) <8 l,p 
3! 


vec 


2 3 2 2 4 
_ i\(8>5 / (d + 10)!! 


c;>r+ ^ 


2 « 


+ 225 - 30+±+l - 1380+++! - 13500 (++4+M 

+ 2 (+±l+_l 6 +±i>l! +6 o+^ — 60^ + 15 ‘ 


2 2 

, 01c (d + 6)!! , (d + 2)!! 

+ 315—^ + 2475—^ 

(vec C" 1 ^ 


(Id 2 <8 c(4, d) 8> 1^2) ; 


-1\®4 


4! 


(vec C x 0 


(44) 
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Using H 2 (/C) = 1 and R(JC) = 2 ~ d ir ~ d / 2 for d dimensional multivariate Gaussian 
kernel and above Equation (44) for [R(/"(x))]j the Equation (35) is obtained. The total 
roughness R(f"(x)) = R(/ // (x)) / <5f 2 . For better visulization, there is assumed whiten 
random vector with C x = 1^ or vecC x = <52- The further simplification is achieved as 
under: 


As, ABd = {d! <8> A)vecB 

(c (k,d) ®I d2 ) 02 ( vec C- 1 ) 0 ^ 2 = devec( d 4 d 2fc) ^vec(C~ 1 ) 0fc+2 j c(k,d )® 2 
=+ (^(c(k, d) <S> I^z) 02 ( vec C“ 1 ) 0fc+2 ^ o ^2 = c(fe, d) 02 8 f k vecC x = 62 ) 
Also, ^(I rf 2 <gi c(4, d) <8> Id?)' (vec C 0 ) ul j Sf 2 = c(4, d)'Sf 2 


where, devec is an operation converting a vetor into matrix as the dimensions specified. 


Then, the bandwidth selection rule in Equation (37) as under. 


hoc = (CIV) d + 4 

d 2 — 2d + 4 


where, C = 


(45) 


+ i +130 (^ + 225 . _ 20 (^ _ 300 (^) (hMpf! 

+ ^(™ + 315 q^ + 2475 (^ + 225 _ 

30^ _ 138 „(i±ii! _ 135 O0 (di+pr) 

+2 1 ((^ _ 16 (^ +60 (^ _ 60 4 + X 


Over all, the Gram-Charlier A Series based Extended Rule-of-Thumb for bandwidth 
parameter hoc selection in multivariate KDE is obtained. The derivation assumes near 
Gaussian unknown PDF and Gaussian kernel. With c(3, d) = 0, c(4, d) = 0, the Silverman’s 
Rule-of-Thumb is one case of the extended rule. 


For better visualization, the same equation for d = 2 and whiten random vector is 


given as under: 


*(/"(*)) = 

1 

47T 

«(/"(x)) = 

1 ■ 

47T - 


1 + 45 


'c(3 ,d)® 2 'Sf 3 \ + 225 / c(4,d) 02 'af \ +6 c(4,d)'df 


3! 2 


4! 


4! 
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With i?(/"(x)) from Equation (46), R(IC) = 2 d it d / 2 and = 1 for Gaussian kernel 


gives the required bandwidth parameter from Equation (37) is given as: 
hoc = {CN)—e 

1 if both c(3, d) = 0, c(4, d) = 0; 

1 + 0.3906c(4, d)® 2 ' + 0.25c(4, d)'Sf 2 if c(3 ,d) = 0; 


(47) 


where, C = < 


k 1 + 1.25c(3, d)® 2 '<5f 3 + 0.3906c(4, d)® 2 '<5f 4 + 0.25c(4, d)’df 2 otherwise 


8. ExROT for Bandwidth Selection in Kernel Density Derivative 
Estimator 

In general, the r th derivative of univariate or multivariate density /(x) using a v th order 
kernel is estimated as under: 

1 N 

f (r) ( x ) = jf K vk ( x - x *) ( 48 ) 

1=1 

where, usually a product kernel is used with whitening in all directions and H = hl^xd as 
defined under: 

-M „ \ _ TT 4 rfoi) ( x i i ~ x i 


*£h( x - x *) = n h K 


3 = 1 


h 


(49) 


The ( rj) th derivative corresponds to the fact that 


f {r \x) = 


d r f(x ) 


d ri x\d r2 X2 ■ ■ ■ d rd Xd 


The AMISE for density derivative as derived in (Henderson and Parmeter, 2012) and from 
1 is given as 

AMISE {/ (r) (x)J 


Equation 36 is given as under: 

M/C 


1 


h AMISE = 


,„, )2 j I*? (Hr + v,d)’ (If 6 2 )Ydx +m[s: R(>C^) (50) 

'(u !) 2 R(JC^) 


2u+2 r-\-d 


2v ^(K)R(f( r+v )(x)N 


(51) 


With this, the ExROT for gradient density can be derived and needs i?(k(r + v, d)) 
dehnition. With Gaussian kernel i.e. v = 2 and r = 1 the required parameter for 1- 
dimension can be derived as under. 

2 

dz 






a J 


4!a 7 


1 


115 1 fk 3 \ 2 10395 1 fk 4 \ 2 135135 1 /k 4 \ 945 

ct 6 ¥ + ct I 2 V3!/ 64 + { 4\ ) 2 7 + ^ V IT J "32" 


2 v / 7T(T 

hoc = cr(CN)~3 

where, C = + 4.5117-^ + 1.8329-^- + 2.4609^4 


a 


6 


a 


o 


14 


a 


6 


(52) 

(53) 
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Similarly, with Gaussian kernel i.e. v = 2, r = 1 and whiten components the bandwidth 
parameter for d-dimensional density derivative is derived as under: 


hoc = (CAT) <2+2+4 
where, C = 


(54) 


(d + 4)!! |0 d 6 (d + 2)!! 

™ + 31 ^ + 2475 (^ + 225 _ 


+ 


(d + m _ 1380 (d + m _ 1350 ^ ( c(3, ri) „ 2 
2 5 2 3 2 


+ 


(d + 12)!! 


2 7 


+ 651 


(d + 8)! 
2 5 


+ (105 2 + 4410) 


3! 2 

(d + 4)! 
2 3 


+ 105 2 - - 4620 


(d + 6)! 

2 4 


+ —2 * 105 


2 (d + 2)H _ 42 (^ + 10)!!^ (c(M)® 2 '5' 


2 2 


2 6 


4J2 


+ 


_ 24 (1±^ + 168 <l±ill _ 42( +±^ + 315 + 


For better visualization, the same equation for d = 2 and whiten random vector is given as 
under: 


1 


*(nx)) = ^ 


3 + 225 


{3,d)® 2 '6f 3 \ 201600 /c(4,d)® 2 ' 4 \ 864 c(4, d)'^f 2 


3! 2 


+ 


2 7 


4! 2 


+ 


32 


4! 


hoc = (CN) s 

where, (7 = 3 + 6.25c(3, d)® 2 '5f 3 + 2.7344c(4, d)® 2 '5f 4 + 2.25c(4, d)'Sf 2 


(55) 

(56) 


9. Conclusion and Future directions 

The article addresses the issue of bandwidth selection in KDE for both - univariate and 
multivariate. There has been proposed Gram-Charlier A Series based Extended Rule-of- 
Thumb (ExROT) on the assumption that the density being estimated is near Gaussian. 
The performance analysis of ExROT is done using standard test set for univariate density 
estimation. The results show that in all nongaussian unimodal density estimation cases 
- skewed or kurtotic or with outlier - ExROT has performed better than ROT. This is 
achieved at computation comparable to ROT and too small compare to the e-exact solve- 
the-equation plug-in rule. The ExROT has also outperformed ROT in some of the skewed 
multimodal density estimation - skewed bimodal, claw, Asymmetric claw. The Gram- 
Charlier A Series based ExROT for bandwidth selection is also obtained for multivariate 
KDE and multivariate density derivative estimations. 

The article serves as a particular demonstration to a more generalized class of band¬ 
width selection rules based on PDF approximations through infinite series. The PDF ap¬ 
proximation through infinite series expansion is a well established area and there exist many 
such approximations based on various reference PDFs. As the first results are encouraging, 
many such rules can be developed. 
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Appendix A. AMISE for bandwidth parameter selection in KDE 

Given N realizations of an unknown PDF f(x), the kernel density estimate f(x) is given by 

f( x '> = Nh'£ K {hr) = N^ Khix - x ‘> (57) 

i= 1 v 7 i=l 

where, K(u ) is the kernel function and h is the bandwidth parameter. Usually, K{u) is a 
symmetric, positive definite and bounded function; mostly a PDF; satisfying the following 
properties: 

/ oo poo poo 

K(u)du =1, / uK(u)Du = 0, / u 2 K(u)du = /i2 (K) < oo 

-oo J—oo J —oo 


The accuracy of a PDF estimation can be quantified by the available distance measures 
between PDFs; like; L\ norm based mean integrated absolute measure, L2 norm based 
mean integrated square error (MISE), Kullback-Libeler divergence and others. The optimal 
smoothing parameter (the bandwidth) h is obtained by minimizing the selected distance 
measure. The bandwidth selection rule based on the most widely used criteria MISE or 


IMSE (Integrated Mean Square Error) is derived as under (Silverman 1986 Wand and 


Jones 1994) 


ISE (f{x)J{x)) 
MISE(/(*),/(s)) 


L 2 (f{x)J(x)):= (f(x) - f{x)) 2 dx 


E{ISE(f(x),f(x))} = E 


(/(*) - fi x )) 2 dx 


' —OO 

r 00 


/ oo 

E{(f(x) - f(x)) 2 }dx = I MSE(/(*),/(*)) = IMSE(/(x),/(x)) 

-00 J — 00 

/ oo 

(- E{f(x )} - /(x)) 2 + E{(f(x) - E{f(x)}) 2 }dx 

-00 

/ oo 

Var (f(x))d: 

-OO 


' —OO 
poo 


Bias 


(58) 


Now, E{f(x)} 



•OO 


/ K(z)f(x 


hz)dz ( substituting z 


x—s \ 
h > 
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Expanding f{x — hz) using Taylor Series with the assumption of f(x) being a smooth PDF 
i.e. all derivatives exist 


r°° / h 2 7 2 \ 

E {f( x )} = j K ( z ) \Ji x ) ~ hzf'(x) + —^~f"{x) + 0(h 2 )J dz 


= /0) + y/^2 ( K )f"{x) + 0(h 2 ) 


Bias(/(x)) « —n 2 (K)f”(x) 

where, g, 2 (K) = f z 2 K 2 (z)dz, the second order central moment of the kernel. 


(59) 
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1 f 00 1 / A \ 2 

— J K 2 {z)f(x - hz)dz - — [f(x) + Bias (f(x 


f(s)ds 


(60) 


where, s is the mean of x and z = -V Now, expanding f(x — hz) using Taylor Series with 
assumption of f(x) being a smooth PDF i.e. all derivatives exist 

- 1 r°° ( h 2 z 2 \ 

Var {f{x)) = — J I< 2 {z) [f(x) - hzf\x ) + -^-/"(®) + 0(h 2 )J dz 

(/(®) + + 0(h 2 )^j 

=> Var (f(x)) ~ ~Nh^ X ^ I ^ 2 ( z )d z ('•' assuming large N , small h ) (61) 


Combining equations (58), (59) and (61) 


MISE(/(®)) = ^ 2 (K)) 2 R(n + ^R(K) + 0(h 4 ) + O (j^J 

where, R(f") = f (f'(x)) 2 dx and R(K) = f K 2 (z)dz. In general, R{g) = / g 2 (z)dz is 
identified as the roughness of function g(x). An asymptotic large sample approximation 
AMISE is obtained, assuming lini/v-^oo h = 0 and linpv^oo Nh = oo i.e. h reduces to 0 at a 
rate slower than 1/N. 


AMISE (/(a;)) = ^(/r 2 (K)) 2 R(f '") + ±R(K ) 


(62) 
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The Equation (62) interprets that a small h increases estimation variance, whereas, a larger 
h increases estimation bias. An optimal h minimizes the total AMISE(/(x)). So, 


J- AMISE(f(x )) = h 3 ( f i 2 (K)) 2 R(f")- 1 ^R(K) = 0 

( R(k) . , 

^kAMWE = ^ 2(K)2 R{ f„ )N ) ( 63 ) 

Thus, the optimal bandwidth parameter depends upon some of the kernel parameters, 
number of samples and the second derivative of the actual PDF being estimated. 


Appendix B. Kronecker Product and K-Derivative Operator 


The section briefs preliminary requisites on the Kronecker product and its repeated appli¬ 
cations to the vector differential operator. More details can be found in McCullagh (1987); 
Terdik (2002); S. Rao Jammalamadaka and Terdik| (May, 2006). 


Definition 1 (Kronecker Product Operator (®)) The Kronecker Product Operator ((g)) 
between matrices A with size p x q and B with size m X n is defined as: 


A® B 


anB ai 2 B ai q B 

a 2 iB a 22 B • • • a 2g B 


a p iB a p2 B • • • o P qB 


The resultant matrix is of dimension pm x qn. 

Let A is with size p x 1 and B is with size m X 1, ther0 A ® B' is a matrix 
with size p x m. A ® A is symbolically represented as A® 2 and has size p 2 x 1. In general, 
A® A®...® A (n times) is symbolically represented as A® n and has size p n x 1. Accordingly, 
A 0n B has size p n m x 1. 


Definition 2 (Jacobian Matrix) Let A = (Ai, A 2 ,..., \d)', A G and f(A) = 

(/i(A), / 2 (A),..., f m (W £ be a differentiable m-component vector function. Then, 
Jacobian matrix (J) of f (J(f)) is an m x d matrix defined as under: 


J (f(A)) = K 


df df df 

dAi' dA 2 ’ " ' ’ dA d 


dfi 

dh .. 

. dh 

d\i 

8X2 

d\ d 

dh 



d\\ 



dfm 

8fm m t 

8fm 

d\i 

8X2 

8X d 


Now, if the vector differential operator is defined as a column vector D^ = 
then the Jacobian matrix can be re-written as: 


(J)_ _d_ _d_ 

. 9 X d 


/ 


J(f (A)) = D A (f) = f (A)D' a = (MX), f 2 ( A),..., fm(X)) 1 


2. The symbol ’ stands for Transpose of a matrix 
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This implies that to match the definition of derivative from matrix calculus, the vector 
differential operator should be applied from right to the left. This is also to fulfill the 
requirement that a new defin So, applying vector derivative operator from right to the left, 
has been kept as a rule throughout the article. 

Definition 3 (The K-derivative Operator) Let A = (Ai, A 2 , -.., A f)', A G R d , the vec¬ 
tor differential operator D A = (gf^, gf^,..., gf-) and f(A) = (/i(A), / 2 (A),..., f m ( A))' € 
M m be a differentiable m-component vector function. Then, the Kronecker product between 
D A and f(A) is given as under: 

r dh dfi ... dh 1 / 

9 Ai 8X2 8Xd 

df2 ■. : 

Df f(A) = Fee 9Al 

dfm dfm . . . dfm 

< 9 Ai 8X2 dXj 

^Dff(A) = Cec(")' = Kec(^f') 

where, the Fee operator converts m x d matrix into an md x 1 column vector by stacking 
the columns one after an other. The operator is called Kronecker derivative operator 
or simply, K-derivative operator. 

Thus, the Kronecker product with the vector differential operator, obtains vectorization of 
the transposed Jacobian of a vector function. Corresponding to the definition, the k-th 
order differentiation is given by: 

r f) o "I f 

Off = D® (Df-f) = [/.(A), A (A),.... UWf ® [ Wt , ^--■ . ^ 

The Df fc f is a column vector of dimension md k x 1. 

Some important properties of the K-derivative operator are listed below. 

Definition 4 (Scaling Property) Let A = (Ai, A 2 ,..., XdY, A G R d , f(A) € M m and 
fx(A) = Af(A), where A is an n x m matrix. Then 

D®(f 1 ) = (A®I d )D®(f) 

where, 1^ is a d- dimensional unit matrix. 

Definition 5 (Chain Rule) Let A = (Ai, A 2 ,..., A^)', A € R d , f(A) G M mi and g(A) G 
M" 12 . Then 

D A ( f ® g) = K 34 2 (^i,m 2 ,d)((Dff) ®g) + f ® (Dfg) 

where, K 3 0 . 2 (mi, m 2 , d) is a commutation matrix of size mim 2 d x mim 2 d that changes the 
order of the the Kronecker product components. For example, 

K 3u2 (mi,m 2 , d)(ai <8> a 2 a 3 ) = ai <8> a 3 <8> a 2 
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The application of Scaling property gives a following simplification: 


D 0 A ® fc= ^ Kj - +1++fc (d [fc] )j 

where, o?^ = [d, d,..., d\'^ kxl y The repeated applications of chain rule gives a following 

simplification: 

D 0r a /0fc A 0fc = k ( k _iy.. (jfe _ r + 1 ) 

where, a is a vector of constants. 
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